The QRZ! Ham Radio CDROM
Callsign Database Technical Specification
Rev C
October 1996

The following information is provided for developers who wish to write
their own software to directly access the QRZ callsign database files.
Windows programmers, including Visual C++ and Visual Basic users, should
refer to the help file called QRZDLL.HLP, which documents database access
through our custom dynamic link libraries, QRZDLL.DLL (for Win3) and
QRZ32.DLL (for Win95/NT).

Users of the QRZ! Ham Radio CDROM who wish to write their own callsign
database search and retrieval software are encouraged to do so. We
welcome user contributed shareware programs for future versions of the
QRZ! Ham Radio CDROM.

Overview

There are three versions of the QRZ software for the PC, all of which
share a common architecture. Separate versions are provided for DOS,
Windows 3.1 and Windows 95 and/or Windows NT. All of the programs access
the database using the same method.

The QRZ callsign database indexing and retrieval method was designed and
optimized for CDROM use. The primary goal was to provide fast searches
for the most commonly sought after information. A key to this strategy
is the caching of index information in memory to minimize reads and,
more importantly, seeks from the CDROM drive. The method described below
implements one such strategy and has been shown to require only one
CDROM head seek per database lookup.

Database Structure

The QRZ callsign database is composed of four separate copies of the
data, and four indices, each of which is sorted by different criteria.
One copy is sorted by callsign, one by last name, one by city/state and
one by zip code. Due to differences in the way foreign addresses are
represented, many DX countries are not represented in the City/State and
Zip code databases. These countries are generally searchable by callsign
and/or name only.

   Data file       Index file      Database type
   -----------------------------------------------------
   callbkc.dat     callbkc.idx     Callsign
   callbkn.dat     callbkn.idx     Name
   callbks.dat     callbks.idx     State and City
   callbkz.dat     callbkz.idx     Zip Code

All of the database files are located in the directory \CALLBK on the
CDROM. Each of the four data files (*.dat) is accompanied by a
corresponding index file (*.idx). The index files contain keys taken
from their corresponding databases by sampling the databases at regular
file offset intervals. The sampling intervals are chosen to produce
indices that are no more than 64 Kbytes in length so that each can be
contained within a 64 Kb memory segment under DOS and Windows 3.x. The
same indices are used in the Win32 environment, despite the fact that
there are no 64 Kbyte segment constraints, to preserve compatibility
with Win3 and DOS programs.

The sampling interval for the index keys is subject to change from one
release of the QRZ CDROM to the next and is therefore recorded as one of
the critical operating parameters in the header of each index file. This
value, referred to as BytesPerKey, must be treated by your program as
dynamic and must be fetched from each of the index headers at the start
of each session. A different BytesPerKey value is used for each
database.

The index header occupies the first 48 bytes of the *.idx file and has
the following format:

/*
** Index Header Block Definition (Version 2)
** (applies to all QRZ CDROMS from Version 2 onward)
**
** This block is located at the start of each index file
*/
typedef struct {
    char DataName[16];     /* Name of the data file             */
    char BytesPerKey[8];   /* Data Bytes per Index Item         */
    char NumKeys[8];       /* Number of items in this index     */
    char KeyLen[8];        /* Length of each key item in bytes  */
    char Version[8];       /* Database Version ID               */
} index_header;

All values in the index header block are stored in ASCII character
representation.
These characters must be converted (by your program) into string or
integer values as necessary. Characters are left justified within each
field and unused field characters (if any) are zero filled. Your program
should not depend on the presence of the null characters when reading
these fields, since some values could legitimately fill the entire
field.

Index Data Formats

Some number of keys (given by the NumKeys field) immediately follow the
index header block in the index file. All keys in a given index file
have a width (in bytes) of 'KeyLen'.

The name index (CALLBKN.IDX) uses uniform keys which are set to a
maximum of 'KeyLen' characters per name. Longer names are simply
truncated. Names are stored in last-first format with a space between
the two parts.

The city/state index (CALLBKS.IDX) also uses 'KeyLen' characters per
entry, with the two character state code occupying the first two
characters and the city name in the last 10 characters. For example, the
town of Fremont, CA is represented as CAFREMONT in the index.

The callsign index (CALLBKC.IDX) has its own 'KeyLen' slot width
(typically 6 characters) and the zip code index (CALLBKZ.IDX) likewise
has its own, with a typical KeyLen value of 5.

Your program must always interpret KeyLen, BytesPerKey, and NumKeys and
never make assumptions regarding their sizes. These sizes could change
in a future edition of the database and your program must be prepared to
deal with it.

Using the Index Header Block

The header block describes the key data which immediately follows it.
The keys are tightly packed on 'KeyLen' boundaries without separators,
making them ideal candidates for use as memory arrays. Although unused
key characters will be zero filled to the right, there is no guarantee
that any given key will be null terminated. Because of this, the indices
must always be searched in a random access, fixed record length format.
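
The header rules above (ASCII digits, left justified, zero filled, and
possibly not null terminated) can be sketched in C as follows. This is
only an illustration under stated assumptions: the index_info structure
and the helper names field_to_long and decode_index_header are
hypothetical and are not part of the QRZ software or its DLLs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Index header exactly as laid out in the specification (48 bytes). */
typedef struct {
    char DataName[16];
    char BytesPerKey[8];
    char NumKeys[8];
    char KeyLen[8];
    char Version[8];
} index_header;

/* Hypothetical container for the decoded values (not part of the spec). */
typedef struct {
    char data_name[17];   /* DataName plus a guaranteed terminator */
    long bytes_per_key;
    long num_keys;
    long key_len;
} index_info;

/* Hypothetical helper: convert a fixed-width ASCII field to a long.
 * The field is copied to a terminated scratch buffer first, because a
 * value may fill the whole field and leave no trailing null. */
long field_to_long(const char *field, size_t width)
{
    char tmp[17];

    assert(width < sizeof tmp);
    memcpy(tmp, field, width);
    tmp[width] = '\0';
    return strtol(tmp, NULL, 10);
}

/* Decode the first 48 bytes of an *.idx file.
 * Returns 0 on success, -1 if a numeric field is not positive. */
int decode_index_header(const unsigned char *raw, index_info *out)
{
    index_header h;

    memcpy(&h, raw, sizeof h);
    memcpy(out->data_name, h.DataName, sizeof h.DataName);
    out->data_name[sizeof h.DataName] = '\0';
    out->bytes_per_key = field_to_long(h.BytesPerKey, sizeof h.BytesPerKey);
    out->num_keys      = field_to_long(h.NumKeys, sizeof h.NumKeys);
    out->key_len       = field_to_long(h.KeyLen, sizeof h.KeyLen);
    if (out->bytes_per_key <= 0 || out->num_keys <= 0 || out->key_len <= 0)
        return -1;
    return 0;
}
```

A caller would read the first 48 bytes of each index file into a buffer,
pass it to decode_index_header, and then read num_keys * key_len bytes of
key data into memory.
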

A typical program will first search for the system drive that contains
the \CALLBK base directory. Next, the program will open and load each of
the four indices into four separate 64 Kb memory buffers. Searching the
indices is then performed by addressing the buffers as one-dimensional
arrays which contain 'NumKeys' elements that are each 'KeyLen' bytes
wide.

A search for a particular item starts with the user inputting a desired
key, which is then formatted into an index key value. The program then
uses this key value to locate the closest match in the index table which
is less than or equal to the user supplied key. For most machines a
simple linear search of the table will be fast enough; however, a binary
search algorithm can be employed.

After the relevant table key is chosen, its ordinal position from the
start of the table is saved in a variable called KeyOffset. Next, the
program must multiply the KeyOffset value by the BytesPerKey value,
which yields a DataOffset value. This DataOffset value is then used as
an index into the actual data file (*.dat).

Typically, a program will use the DataOffset value as an argument to a
File Seek system call ( fseek() ). Once the file pointer is positioned
at the DataOffset, the program can then begin a linear search for the
desired record in the database. Again, a binary search between
[DataOffset] and [(KeyOffset+1) * BytesPerKey] can be used; however,
experience has shown this will provide only a minimal improvement in
performance.

Be aware that the derived DataOffset value will usually land you in the
middle of some record. This is typical, and you will find that the
callsign that was pointed to by the index key will be located at the
beginning of the next text line in the file. The data file is an
ordinary ASCII text file with a single newline (0x0a) character at the
end of each line.

The search of the data file should terminate at offset
[(KeyOffset+1) * BytesPerKey] if the desired record has not yet been
found.
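
The lookup procedure above can be sketched in C as follows. This is a
rough illustration, not the QRZ implementation: the names key_cmp,
raw_key_cmp, find_key_offset, data_window and seek_to_first_record are
hypothetical, a binary search is used in place of the simple linear
search, and the plain byte comparison in raw_key_cmp is an assumption
(the callsign index may need a comparison that follows the callsign
collating sequence instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical comparison callback: returns <0, 0 or >0 like memcmp. */
typedef int (*key_cmp)(const char *a, const char *b, size_t key_len);

/* Assumption: keys can be compared byte by byte over KeyLen bytes. */
int raw_key_cmp(const char *a, const char *b, size_t key_len)
{
    return memcmp(a, b, key_len);
}

/* Return the ordinal (KeyOffset) of the last index key that is less
 * than or equal to 'wanted', or -1 if 'wanted' sorts before the first
 * key. 'keys' is the packed key table that follows the index header. */
long find_key_offset(const char *keys, long num_keys, size_t key_len,
                     const char *wanted, key_cmp cmp)
{
    long lo = 0, hi = num_keys;

    assert(key_len > 0);
    while (lo < hi) {               /* find the first key > wanted */
        long mid = lo + (hi - lo) / 2;
        if (cmp(keys + (size_t)mid * key_len, wanted, key_len) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo - 1;
}

/* Compute the data file window [start, end) for a given KeyOffset. */
void data_window(long key_offset, long bytes_per_key,
                 long *start, long *end)
{
    *start = key_offset * bytes_per_key;
    *end   = (key_offset + 1) * bytes_per_key;
}

/* Seek to DataOffset and discard the partial record found there, so
 * that the next read starts at the beginning of a full text line.
 * Offset 0 is already the start of the first record.
 * Returns 0 on success, -1 on error or end of file. */
int seek_to_first_record(FILE *fp, long data_offset)
{
    int c;

    if (fseek(fp, data_offset, SEEK_SET) != 0)
        return -1;
    if (data_offset == 0)
        return 0;
    while ((c = fgetc(fp)) != EOF && c != '\n')
        ;
    return c == EOF ? -1 : 0;
}
```

After seek_to_first_record succeeds, a caller would read lines with
fgets() and compare each one against the desired key, stopping once
ftell() reaches the end of the window.
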

Database Format

The database files all have the same format. They are ASCII files which
consist of one text line per record. Each record consists of a fixed
number of comma separated fields, with blank fields represented by
consecutive commas. Each line is terminated with a single ASCII newline
('\n', 0x0a, or chr$(10)) character. Every record has the same number of
commas in it, except for cross-reference records, which are discussed
below.

If the data itself is supposed to contain a comma, then it is
represented in the database by a semi-colon ';' which should be replaced
by a comma in the program's text output formatting routine.

Here's an example of one record from the database (shown on one line):

AA7BQ ,LLOYD,,FRED L,,53340,90009,00009,8215 E WOOD DR,SCOTTSDALE,AZ,85260,E,KJ6RK,A

/*
** Standard Record Field Offsets
*/
#define Callsign         0    /* AA7BQ                     */
#define LastName         1    /* LLOYD                     */
#define JR               2    /* (reserved, see * below)   */
#define FirstName        3    /* FRED L                    */
#define MI               4    /* (reserved, see * below)   */
#define DateOfBirth      5    /* 53340 = Dec 6, 1953       */
#define EffectiveDate    6    /* 90009 = Jan 9, 1990       */
#define ExpirationDate   7    /* 00009 = Jan 9, 2000       */
#define MailStreet       8    /* 8215 E WOOD DR            */
#define MailCity         9    /* SCOTTSDALE                */
#define MailState        10   /* AZ                        */
#define ZipCode          11   /* 85260                     */
#define LicenseClass     12   /* E  (P = TechPlus)         */
#define PreviousCall     13   /* KJ6RK                     */
#define PreviousClass    14   /* A                         */

* The fields JR and MI were obsoleted by the FCC in July 1994. QRZ uses
these fields by inserting a period (.) into them to indicate the
presence of additional information elsewhere on the CDROM. In
particular, QRZ inserts a period into the JR field when the indicated
callsign has an email address registered in the email database. A period
in the MI field is used to indicate that the CDROM contains a GIF image
file for the indicated callsign. GIF files are found in \CALLBK\GIFS and
are named .gif.

Note: The email database is proprietary and is not documented or
accessible to user-developed programs. Access routines are provided to
C++ and Visual Basic users through the QRZ DLLs.
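
A record line in the format above can be split into its fields with a
few lines of C. The following is a minimal sketch; the function name
split_record and the constant QRZ_MAX_FIELDS are hypothetical. It works
in place on a writable copy of the line, strips the trailing newline,
and turns each stored semi-colon back into a comma.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QRZ_MAX_FIELDS 15   /* Callsign (0) through PreviousClass (14) */

/* Split one record line in place.
 * Every comma is overwritten with '\0' and fields[] receives a pointer
 * to the start of each field. A ';' inside a field is converted to the
 * ',' it stands for. Returns the number of fields stored. */
int split_record(char *line, char *fields[], int max_fields)
{
    int n = 0;
    char *p;

    assert(line != NULL && fields != NULL);
    if (max_fields <= 0)
        return 0;

    line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
    fields[n++] = line;
    for (p = line; *p != '\0'; p++) {
        if (*p == ',') {
            *p = '\0';                  /* terminate the current field */
            if (n < max_fields)
                fields[n++] = p + 1;    /* next field starts here */
        } else if (*p == ';') {
            *p = ',';                   /* restore an embedded comma */
        }
    }
    return n;
}
```

For a standard record the return value is 15, and fields[9] of the
example record above is "SCOTTSDALE". A return value of 2 means the line
held exactly one comma.
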

Callsign Collating Sequence

Callsigns in the QRZ databases are stored in a special columnar format
which aids in performing searches. With this format, the area digit part
of the callsign is always in the same position.

Callsigns are considered to have a prefix, an area number and a suffix.
Collating preference is always given in reverse order, that is, suffix
followed by area number followed by prefix. When callsigns are compared
for sorting and searching, this collating sequence (called 'defcab') is
applied to the callsign, which results in the following logical
behavior:

    abcdef     sort order    reason
   --------    ----------   ---------
   "KB3A  "       1st        def
   "KB2AB "       2nd        defc
   "K 5AB "       3rd        def
   "KB1ABC"       4th        defc
   "K 4ABC"       5th        defca
   "WA4ABC"       6th        defcab
   "WB4ABC"       7th

The 'reason' lists why each entry deserves its position in the list
above the one below it.

To compare two callsigns for greater than, less than or equality, the
program must first transpose them into 'defcab' format (using spaces for
unused positions) and then do a left-to-right comparison of the two. For
example, to compare K1ABC against KC8AB, the program would do the
following:

                                       defcab
   callsign K1ABC is transposed to:   "ABC1K "
   callsign KC8AB is transposed to:   "AB 8KC"

then, a string compare as in:

   strcmp("ABC1K ", "AB 8KC")

will return a "greater than" value, meaning that K1ABC comes _after_
KC8AB in the database, having been found greater at point 'f' in the
defcab sequence.

Date Formats

All dates are stored in 5 character Julian format (a two digit year
followed by a three digit day of the year), e.g. 93003 equals January 3,
1993 or the 3rd day of 1993. Dates before 1900 or after year 2000 must
be determined by the context in which they are used. In other words, if
the resultant age does not make sense, then it is wrong. For example,
all licenses expire in the future, so for license expiration dates 02
must mean 2002. Birthdays are more difficult to judge, but most can be
arbitrarily considered to be greater than 10 years old.
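
Both conventions above, the 'defcab' transposition and the 5 character
date, can be sketched in C as follows. This is an illustration under
assumptions that the specification does not state: to_defcab assumes the
area digit is the last digit in the callsign, with a prefix of one or
two characters and a suffix of at most three, and decode_date resolves
the century by taking the first year at or after a caller-supplied
min_year. Both function names and the min_year parameter are
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Build the 6 character 'defcab' key for a plain upper case callsign
 * such as "K1ABC". Assumption: the area digit is the LAST digit in the
 * callsign. Returns 0 on success, -1 if the callsign does not fit. */
int to_defcab(const char *call, char out[7])
{
    size_t len = strlen(call);
    size_t digit = len;            /* index of the area digit */
    size_t i, pre, suf;

    assert(out != NULL);
    for (i = 0; i < len; i++)
        if (isdigit((unsigned char)call[i]))
            digit = i;             /* remember the last digit seen */
    if (digit == len)
        return -1;                 /* no digit at all */
    pre = digit;
    suf = len - digit - 1;
    if (pre < 1 || pre > 2 || suf > 3)
        return -1;

    memset(out, ' ', 6);           /* spaces for unused positions */
    out[6] = '\0';
    memcpy(out, call + digit + 1, suf);   /* d e f : suffix      */
    out[3] = call[digit];                 /* c     : area digit  */
    memcpy(out + 4, call, pre);           /* a b   : prefix      */
    return 0;
}

/* Decode a YYDDD date. The year chosen is the first year >= min_year
 * whose last two digits equal YY (an assumed windowing rule).
 * Returns 0 on success, -1 if the text is not a valid date. */
int decode_date(const char *yyddd, int min_year,
                int *year, int *month, int *day)
{
    static const int mdays[12] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int i, yy, doy, y, leap;

    for (i = 0; i < 5; i++)
        if (!isdigit((unsigned char)yyddd[i]))
            return -1;
    yy  = (yyddd[0] - '0') * 10 + (yyddd[1] - '0');
    doy = (yyddd[2] - '0') * 100 + (yyddd[3] - '0') * 10 + (yyddd[4] - '0');
    y = min_year + ((yy - min_year % 100) + 100) % 100;
    leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    if (doy < 1 || doy > 365 + leap)
        return -1;
    for (i = 0; i < 12; i++) {     /* walk the months, consuming days */
        int mlen = mdays[i] + (i == 1 && leap);
        if (doy <= mlen)
            break;
        doy -= mlen;
    }
    *year = y;
    *month = i + 1;
    *day = doy;
    return 0;
}
```

Two keys produced by to_defcab can be compared directly with strcmp().
For an expiration date, passing the current year as min_year keeps the
result in the future; for a birthday, passing the current year minus 109
yields ages between 10 and 109.
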
This method of resolving the century is not perfect, but it does yield
satisfactory overall results.

Cross Reference Information

When the FCC supplies a "previous callsign" in their database, it is
used by QRZ to construct a cross reference so that a person can be found
by their old call as well as their new one. A cross reference record is
distinguished from other records as one which contains only one comma.
It takes the form "OldCall,NewCall" with no other information on the
line. When a cross reference record is encountered, your program must
fetch the second field and restart the search from the beginning to
return the primary reference.

Summary

It is the desire of QRZ to maintain this database and index format on
all future releases of the QRZ CDROM. It would be nice to find a method
for extending the data to include Vanity callsign status while
preserving compatibility with older programs. We are currently
considering appending a dot (.) to the license class (e.g. "E.") if the
callsign was issued under the Vanity program. Your comments are welcome
on this idea.

Please address programming questions and/or comments to:

   flloyd@qrz.com

-------------------------------------------------------------------
Fred Lloyd, AA7BQ                                          10/26/96